Algorithms for CVaR Optimization in MDPs

Authors: Chow Yinlam; Ghavamzadeh Mohammad
Publication date: 10/07/2014

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures and, because of its computational efficiency, has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms that each use a specific method to estimate this gradient and update the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
Comment: Submitted to NIPS 1
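For readers unfamiliar with CVaR, the following is a minimal sketch of how it can be estimated from a sample of costs. It is not taken from the paper: it assumes the common Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of nu + E[(Z - nu)^+] / (1 - alpha), with costs where larger is worse and alpha a confidence level in (0, 1). The function name `empirical_cvar` is hypothetical.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Estimate CVaR_alpha of a cost sample (illustrative sketch only).

    Assumes the Rockafellar-Uryasev form:
        CVaR_alpha(Z) = min_nu  nu + E[(Z - nu)^+] / (1 - alpha)
    """
    costs = np.asarray(list(costs), dtype=float)

    def objective(nu):
        # Sample average of the positive part of (Z - nu), scaled by the tail mass.
        return nu + np.mean(np.maximum(costs - nu, 0.0)) / (1.0 - alpha)

    # The objective is convex and piecewise linear in nu with kinks at the
    # sample points, so a minimum is attained at one of the samples.
    values = [objective(nu) for nu in costs]
    best = int(np.argmin(values))
    # Return the CVaR estimate and a minimizing nu (an empirical VaR).
    return values[best], costs[best]
```

As a usage example, for the costs 1, 2, ..., 10 and alpha = 0.8, the estimate is about 9.5, which is the mean of the two largest costs (the worst 20% of outcomes).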

arXiv.org e-Print Archive

CiteSeerX

Online Modified Greedy Algorithm for Storage Control under Uncertainty

Authors: Chow Yinlam; Qin Junjie; Rajagopal Ram; Yang Jiyan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/06/2015

This paper studies the general problem of operating energy storage under uncertainty. Two fundamental sources of uncertainty are considered, namely the uncertainty in the unexpected fluctuation of the net demand process and the uncertainty in the locational marginal prices. We propose a very simple algorithm, termed the Online Modified Greedy (OMG) algorithm, for this problem. A stylized analysis of the algorithm is performed, which shows that, compared with the optimal cost of the corresponding stochastic control problem, the sub-optimality of OMG is bounded and approaches zero in various scenarios. This suggests that, albeit simple, OMG is guaranteed to have good performance in some cases; in other cases, OMG together with the sub-optimality bound can be used to provide a lower bound for the optimal cost. Such a lower bound can be valuable in evaluating other heuristic algorithms. For the latter cases, a semidefinite program is derived to minimize the sub-optimality bound of OMG. Numerical experiments are conducted to verify our theoretical analysis and to demonstrate the use of the algorithm.
Comment: 14-page version of a paper submitted to IEEE trans on Power System

arXiv.org e-Print Archive

CiteSeerX

Distributed Online Modified Greedy Algorithm for Networked Storage Operation under Uncertainty

Authors: Chow Yinlam; Qin Junjie; Rajagopal Ram; Yang Jiyan
Publication date: 03/11/2014

The integration of intermittent and stochastic renewable energy resources requires increased flexibility in the operation of the electric grid. Storage, broadly speaking, provides the flexibility of shifting energy over time; the network, on the other hand, provides the flexibility of shifting energy over geographical locations. The optimal control of storage networks in stochastic environments is an important open problem. The key challenge is that, even in small networks, the corresponding constrained stochastic control problems on continuous spaces suffer from curses of dimensionality and are intractable in general settings. For large networks, no efficient algorithm is known to give optimal or provably near-optimal performance for this problem. This paper provides an efficient algorithm to solve this problem with performance guarantees. We study the operation of storage networks, i.e., storage systems interconnected via a power network. An online algorithm, termed the Online Modified Greedy algorithm, is developed for the corresponding constrained stochastic control problem. A sub-optimality bound for the algorithm is derived, and a semidefinite program is constructed to minimize the bound. In many cases, the bound approaches zero so that the algorithm is near-optimal. A task-based distributed implementation of the online algorithm, relying only on local information and neighbor communication, is then developed based on the alternating direction method of multipliers. Numerical examples verify the established theoretical performance bounds and demonstrate the scalability of the algorithm.
Comment: arXiv admin note: text overlap with arXiv:1405.778

arXiv.org e-Print Archive

CiteSeerX